Statistical Inference for Big Data Problems in Molecular Biophysics
Abstract
We highlight the role of statistical inference techniques in extracting biological insights from long time-scale molecular simulation data. Technological and algorithmic improvements in computation have brought molecular simulations to the forefront of techniques for investigating the basis of living systems. These longer, increasingly complex simulations produce data sets reaching petabyte scales. While these simulations promise important insights into the mechanisms of biomolecular function, teasing out the information that reveals atomistic-scale behavior has become a challenge in its own right. Mining this data for biologically relevant patterns is critical to automating therapeutic intervention discovery, improving protein design, and fundamentally understanding the mechanistic basis of cellular homeostasis.

1 Molecular Biophysics

Biological macromolecules such as proteins, deoxyribonucleic and ribonucleic acids (DNA/RNA), carbohydrates and lipids play diverse roles in regulating cellular functions and are thus essential to sustaining life. To investigate and understand the mechanistic basis of biomolecular function, biophysicists have, over the last 30 years, taken advantage of advances in computing power to run increasingly detailed simulations of these biomolecules. Particular attention has been paid to the simulation of proteins, which are often considered the workhorses of the cell. Proteins are long polymers of amino-acid residues that fold into three-dimensional structures to perform their function. Biological functions are controlled by the dynamical interactions between various biomolecules and can occur at multiple time scales, from femtoseconds up to microseconds, milliseconds, seconds and beyond, spanning more than 15 orders of magnitude. In this paper, we focus on fully atomistic simulations of proteins and other biomolecules in solution, as they best represent the cellular environment.
Molecular dynamics (MD) simulations provide insights into the dependence of biological function on interactions at multiple length and time scales. MD simulations are governed by a potential energy function that includes both bonded and non-bonded interaction terms. The negative gradient of the energy function defines the force field, which is applied to every atom in the molecule. At each time step, Newton's laws of motion are integrated to generate a trajectory. A time step on the order of a femtosecond (10^-15 s) is necessary to capture the smallest vibrations of interest and to ensure numerical stability when integrating the equations of motion as the simulation progresses. However, biologically interesting events (related to biomolecular function) typically occur at microsecond (10^-6 s) and longer time scales. With improvements in sampling
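The integration loop described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes a one-dimensional harmonic potential in place of a real biomolecular energy function, uses the velocity Verlet scheme as one common choice of integrator, and works in arbitrary units. The names `harmonic_force`, `velocity_verlet`, `total_energy` and `steps_needed` are hypothetical helpers introduced only for illustration.

```python
def harmonic_force(x, k=1.0):
    """Force as the negative gradient of the toy potential U(x) = 0.5 * k * x**2."""
    return -k * x


def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Integrate Newton's equations of motion and return the (x, v) trajectory."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy oscillator."""
    return 0.5 * mass * v ** 2 + 0.5 * k * x ** 2


def steps_needed(target_time_s, dt_s=1e-15):
    """Number of integration steps required to cover target_time_s seconds."""
    return round(target_time_s / dt_s)


if __name__ == "__main__":
    traj = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
    print("frames stored:", len(traj))
    print("final energy:", total_energy(*traj[-1]))
    # Gap between a femtosecond time step and a microsecond event.
    print("steps for one microsecond:", steps_needed(1e-6))
```

With a femtosecond time step, `steps_needed(1e-6)` shows that a single microsecond of simulated time requires on the order of a billion integration steps, each of which can emit a frame of coordinates. This is the arithmetic behind the large trajectory data volumes discussed in the abstract.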
Similar resources
Statistical challenges in nanoscale biophysics
Recent advances in nanotechnology allow scientists to follow a biological process on a single molecule basis. These advances also raise many challenging stochastic modeling and statistical inference problems. First, by zooming in on single molecules, recent nanoscale experiments reveal that some classical stochastic models derived from oversimplified assumptions are no longer valid. Second, the...
Tenth Meeting of New Researchers in Statistics and Probability
Recent advances in nanotechnology allow scientists to follow a biological process on a single-molecule basis. These advances also raise many challenging stochastic modeling and statistical inference problems. First, by zooming in on single molecules, recent nano-scale experiments reveal that some classical stochastic models derived from oversimplified assumptions are no longer valid. Second, th...
Some inverse problems in biophysics
During the past few years, the development of experimental techniques has allowed the quantitative analysis of biological systems ranging from neurobiology to molecular biology. This work focuses on the quantitative description of these systems by means of theoretical and numerical tools ranging from statistical physics to probability theory. This dissertation is divided into three parts, each of...
Inverse statistical problems: from the inverse Ising problem to data science
Inverse problems in statistical physics are motivated by the challenges of ‘big data’ in different fields, in particular high-throughput experiments in biology. In inverse problems, the usual procedure of statistical physics needs to be reversed: Instead of calculating observables on the basis of model parameters, we seek to infer parameters of a model based on observations. In this review, we ...
How big data changes statistical machine learning
This presentation illustrates how big data forces change on algorithmic techniques and the goals of machine learning, bringing along challenges and opportunities. 1. The theoretical foundations of statistical machine learning traditionally assume that training data is scarce. If one assumes instead that data is abundant and that the bottleneck is the computation time, stochastic algorithms with...
Publication date: 2012